
spirv-std functions to query compute builtins#535

Open
fluffysquirrels wants to merge 6 commits into Rust-GPU:move_mod from fluffysquirrels:spirv-std-builtin-fn

Conversation

@fluffysquirrels

@fluffysquirrels fluffysquirrels commented Feb 20, 2026

Requires #540

This PR adds new API to query compute and subgroup builtins via getter functions:

  • new mod spirv_std::compute with builtins:
    • local_invocation_index() -> u32
    • local_invocation_id() -> UVec3
    • global_invocation_id() -> UVec3
    • num_workgroups() -> UVec3
    • workgroup_id() -> UVec3
    • missing gl_WorkgroupSize equivalent, see below
  • new builtins in spirv_std::subgroup:
    • num_subgroups() -> u32
    • subgroup_id() -> u32
    • subgroup_size() -> u32
    • subgroup_invocation_id() -> u32
    • subgroup_*_mask() -> SubgroupMask
  • post-link pass to deduplicate Input OpVariables with Builtin decoration
    • entry points must not have duplicate Builtins declared in the storage classes (Input, Output, or Output with PerPrimitive decoration)

about gl_WorkgroupSize

In SPIR-V there's a WorkgroupSize built-in which, unlike every other built-in, is applied not to an OpVariable but to an OpConstant, and applying built-ins to constants has been deprecated. Instead, the recommendation is that shader compilers look at the ExecutionMode LocalSize of the compute shader and return that when gl_WorkGroupSize is read. glslc literally compiles a gl_WorkGroupSize read into %gl_WorkGroupSize = OpConstantComposite %v3uint %uint_1 %uint_2 %uint_3 without even decorating it with WorkgroupSize, since that's been deprecated. But that's not trivial in rust-gpu: we may have a single module with two compute shaders, and each compute shader may have a different workgroup size. So an fn workgroup_size() -> UVec3 intrinsic would need to be specialized per shader somehow, even if it is called through some non-generic function both shaders use. Possible solutions:

  1. Force-inline all functions that may lead to a workgroup_size() call, so it can be specialized per entry point. Since nothing is passed to the function by reference, this will likely be annoying to implement.
  2. Have a global OpVariable that the entry point fills with the value of WorkGroupSize, and have the intrinsic just read it. This would require support for statics, which I'd honestly like to have for other custom use cases as well.

I don't feel like we need to support this right away; we can delay its implementation.

old

This is a continuation of Firestar99's work in
#459 . I created a new PR only to have somewhere to show my changes.

A small problem: if an entry point accepts a builtin as a parameter and code is also generated for the getter function for that builtin, then two globals are emitted decorated with the same builtin, and validation fails.

See the failing test compute_duplicates.

SPIR-V output:

OpDecorate %2 BuiltIn LocalInvocationIndex
OpDecorate %3 BuiltIn LocalInvocationIndex     

compiletest error message:

error: error:0:0 - [VUID-StandaloneSpirv-OpEntryPoint-09658] OpEntryPoint contains duplicate input variables with LocalInvocationIndex builtin
         %gl_LocalInvocationIndex = OpVariable %_ptr_Input_uint Input
   |
   = note: spirv-val failed
   = note: module `$TEST_BUILD_DIR/builtin/compute_duplicates.vulkan1.2`

error: aborting due to 1 previous error

For ease of use and porting to the new builtin getters, I'd like this to compile successfully by de-duplicating the global variable for the builtin. I think I see similar code in /crates/rustc_codegen_spirv/src/linker/duplicates.rs; is that the right way to handle it? I will try to write this.

I also considered building a map between builtin names and result IDs in the assembler and the entry-point codegen. I think that might run faster, but it would have higher code complexity and need to touch more places.

@Firestar99
Member

Firestar99 commented Feb 20, 2026

Give me a little bit more time to work on my version. Don't get me wrong, I think your version is fine, I'm just wondering if we can extract more information from the intrinsics.

Like global_invocation_id gives you a UVec3 that is known to be unique across all (= global) invocations. So if your workgroup is 1D (e.g. compute(threads(32, 1, 1))), you can safely write to some output buffer using global_invocation_id().x, since you know every invocation has its own unique "slot" in the output buffer and there won't be any overlapping writes (which is UB). So we may be able to provide a safe API for many common write patterns, and of course an unsafe escape hatch.

@fluffysquirrels
Author

fluffysquirrels commented Feb 20, 2026

I was also wondering about safe collection patterns without data races, the equivalent of rayon but for GPU collections. I considered making those newtype wrappers (from my first attempt) immutable and using them as a safe proof that an invocation really did own its slot in the collection.

That is still possible later even if spirv_std::builtin::* return these raw IDs such as UVec3 or u32; I was considering having the newtype wrappers just initialise themselves using these builtin getters.

@fluffysquirrels
Author

Here are some thoughts on safe GPU collections. Is there already a discussion thread or issue for these, or should we make one?

In my compute project I'm playing with implementing some standard algorithms, e.g. reduce, scan, filter, map. For coalesced access I'm using the normal pattern: each workgroup takes a slice of the total data, and each thread operates on one value at a time, at index local_invocation_index + n * WG_LEN for iteration n.

I was thinking a workgroup-level data-race-free API could expose this as a map_into algorithm that takes a &mut [Output] and inserts control barriers as necessary for the whole workgroup to finish.

For simple compute grids with multiple workgroups, the collection fn could consume the target collection to model it not being usable again until the whole grid finishes.

For prefix scan I'm implementing approximately StreamScan (best described IMO in NVIDIA's "Single-pass Parallel Prefix Scan with Decoupled Look-back" paper, if you don't know it). I use a pattern where each workgroup increments a global atomic to take the next sub-slice of data to work on, and increments another when complete. Once all blocks are assigned, workgroups spin-wait on the completed blocks count to reach the expected final count; this acts as a global barrier before proceeding to the next phase of the algorithm. This seemed sufficiently useful and general to extract into a re-usable algorithm. It would be fantastic for the API to mutably borrow the output collections during such a phase, hopefully guaranteeing data-race-freedom. I was thinking a phase could be modelled like a rayon scope.

@fluffysquirrels
Author

There's a bug in the new linker dedupe pass; I'm halfway through fixing it.

@fluffysquirrels
Author

Linker builtin de-dupe bug fixed!

@fluffysquirrels
Author

fluffysquirrels commented Feb 22, 2026

Fixed the minor test errors on 4282ef0.

@Firestar99
Member

I dumped my thoughts here: Rust-GPU/rust-gpu.github.io#96

TLDR:

But if there's a takeaway from this experiment, it's that the uniqueness property of global_invocation_id is not worth preserving in the type system, as it is extremely difficult to take advantage of. Instead, we should get on with implementing builtins as getter functions, and have fn global_invocation_id() just return a simple UVec3.

@fluffysquirrels
Author

Fixed a few lint errors in d6fc65b. 🤞

@Firestar99 Firestar99 marked this pull request as ready for review February 23, 2026 14:05
@Firestar99
Member

Firestar99 commented Feb 23, 2026

I'm cherry-picking the builtin dedup linker pass out into a separate PR; I squashed it down a bit and cleaned up the pass. Primarily, an OpVariable should only have one Builtin decoration. #539

It's just simpler to get a fix like that through reviews, and separate out the API design questions of the new builtin getters.

@Firestar99
Member

Rebased it on #539, squashed it and fixed missing GLSL builtin names.

I'll take another look at this tomorrow and may just copy-paste together all the remaining built-ins.

@fluffysquirrels
Author

Cool.

fixed missing GLSL builtin names.

Great to have those. I didn't look for them in the GLSL extensions, oops!

@Firestar99
Member

Firestar99 commented Mar 3, 2026

Interesting, jj did not keep the ownership of the commits.

Or rather, GitHub is displaying my pfp, even though the commits say they're yours?

@fluffysquirrels FYI you're using a different commit email than the one listed on your profile, so GitHub is marking these commits as made by an unknown contributor. Either change your commit email or add your commit @gmail.com address to your GitHub settings.

@Firestar99 Firestar99 changed the title spirv-std functions to query builtins (fluffy's fork) spirv-std functions to query compute builtins Mar 3, 2026
@Firestar99 Firestar99 changed the base branch from main to move_mod March 3, 2026 10:53
@Firestar99
Member

I'm gonna repurpose this PR to add just the compute and subgroup builtin getters, and worry about the graphics stuff in another PR. Which also means I'm merging #539 into this one, to have one complete working implementation, instead of splitting it up into two PRs.

@Firestar99 Firestar99 force-pushed the spirv-std-builtin-fn branch from 08ccd9d to 286d803 Compare March 3, 2026 11:25
@Firestar99 Firestar99 force-pushed the spirv-std-builtin-fn branch 2 times, most recently from 0f1b95b to 708a6d6 Compare March 3, 2026 13:22
@Firestar99 Firestar99 force-pushed the spirv-std-builtin-fn branch from 708a6d6 to adcba8d Compare March 3, 2026 13:24
@Firestar99
Member

@fluffysquirrels I may have found an entirely new way to implement the linker fixup... The existing type dedup pass added the name (as in OpName) of the OpVariable to the dedup key. Removing it dedups getter-based and attribute-based builtin declarations against each other.

I think this should also be fine with other usages of OpVariable. The key includes all other decorations on the variable, so buffers should work just fine since they have DescriptorSet and Binding decorations, similarly vertex and fragment in / out have Location decorations (and In / Out too).

@Firestar99 Firestar99 force-pushed the spirv-std-builtin-fn branch from adcba8d to f382cfb Compare March 3, 2026 13:56
/// Query SPIR-V (read-only global) built-in values
///
/// See [module level documentation] on how to use these.
#[macro_export]
Author

I deliberately kept load_builtin! private as an implementation detail. As is, it's very unsafe, and it's easy to produce something that won't compile but has an inscrutable error message, or that won't validate.

I think this should remain an internal implementation detail.

Author

Separating out all the builtins into modules in the root makes it a little harder to find them all. I think (especially for compute) someone using the subgroup builtins is also likely to use the compute builtins, so keeping these modules nested under crate::builtin makes sense to me.

In std (and other crates) I've seen people keep all the low-level primitives / intrinsics in std::arch or similar. I think that pattern makes sense.

Any higher-level abstraction on top such as safe collections (alloc::Vec in std or whatever parallel collection for spirv here) could then have its own module in the root.

Member

This won't work for the graphics builtins. Here's a message I sent internally last week about this:

  • Easy case: Some built-ins can be "inherited", e.g. local_invocation_id from compute shaders is used in mesh shaders and ray gen shaders, because they fundamentally function like augmented compute shaders. We will likely just define it once for compute shaders and you can reuse it.
  • Medium case: gl_PrimitiveID
    • vertex, tessellation & fragment shaders: an input
    • geometry shaders:
      • gl_PrimitiveIDIn: an input
      • gl_PrimitiveID: an output
    • mesh shaders:
      • gl_MeshPerPrimitiveEXT[].gl_PrimitiveID unsized per-primitive output array.
  • Hard case: gl_Position
    • vertex shader: an output variable that is written to via gl_Position
    • tessellation control | evaluation shaders:
      • gl_in[gl_MaxPatchVertices].gl_Position: a sized arrayed input variable you can read to get the Nth vertex position
      • gl_Position: an output variable you can write
    • geometry shader:
      • an unsized arrayed input variable you can read to get the Nth vertex position via gl_in.gl_Position
      • gl_Position: an output variable you can write
